Univariate Plots Section

## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.40 9.50 10.20 10.42 11.10 14.90
The lowest alcohol percentage among the red wines was 8.4%, and the highest was 14.9%. The distribution of alcohol percentage by volume appears to be skewed right (the mean, 10.42%, is above the median, 10.20%). There is a very high count of red wines at approximately 9.5% alcohol.

## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.740 3.210 3.310 3.311 3.400 4.010
The pH levels of the red wines appear to follow a roughly normal distribution. There are a few possible outliers near 2.75 pH and 3.85 pH, but nothing too out of the ordinary.

I wanted to look at the distribution of quality ratings for the red wines to make sure nothing was out of the ordinary. This histogram gives us a good idea of whether the experts were being too harsh or too lenient. The roughly normal distribution of quality ratings, with no ratings below 3 and none above 8, reassures me that the experts were neither overly critical nor overly easygoing.

I made a boxplot for pH so I could see if the outliers were having a significant effect on the distribution of the data; they don’t seem to be much of a problem based on the boxplot.
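The boxplot's outlier rule can be checked by hand. The analysis in this report was done in R. The Python sketch below only reproduces the arithmetic of Tukey's 1.5 * IQR fences from the quartiles in the pH summary() output, assuming the boxplot uses that standard whisker rule. The helper name `tukey_fences` is hypothetical.

```python
# Tukey fences for pH, computed from the quartiles in the summary() output above.
# Arithmetic sketch only; it assumes the boxplot uses the usual 1.5 * IQR rule.

def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Values copied from the pH summary: Min 2.740, Q1 3.210, Q3 3.400, Max 4.010
ph_min, ph_q1, ph_q3, ph_max = 2.740, 3.210, 3.400, 4.010
lower, upper = tukey_fences(ph_q1, ph_q3)

print(f"pH fences: {lower:.3f} to {upper:.3f}")
print("points below the lower fence:", ph_min < lower)
print("points above the upper fence:", ph_max > upper)
```

With Q1 = 3.210 and Q3 = 3.400, the fences are about 2.925 and 3.685. Both the minimum (2.740) and the maximum (4.010) fall outside them, which is consistent with the possible outliers on both sides of the pH histogram. A plotting library that computes quartiles slightly differently could draw slightly different fences.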

## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.60 7.10 7.90 8.32 9.20 15.90
The boxplot for fixed acidity shows a lot of outliers above Q3 + 1.5 * IQR. It might be a good idea to see what the distribution looks like with these outliers removed.
Most red wines have fixed acidity levels between 7.10 and 9.20 g/dm^3 (the first and third quartiles). The IQR is 9.20 - 7.10 = 2.1, so 1.5 * IQR = 3.15. Removing the cases where fixed acidity is above Q3 + 3.15 = 12.35 (the outliers) might give us a better look at the distribution.
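The filtering step above can be sketched in Python, although the report itself uses R. This sketch assumes the fixed acidity values are available as a plain list. The function name `remove_upper_outliers` and the toy values are hypothetical, and the toy values are not from the dataset. It also verifies the fence arithmetic from the summary() output.

```python
import statistics

# Arithmetic from the fixed acidity summary() output (g/dm^3).
q1, q3, minimum = 7.10, 9.20, 4.60
iqr = q3 - q1                  # 2.1, up to floating-point rounding
upper_fence = q3 + 1.5 * iqr   # 12.35
lower_fence = q1 - 1.5 * iqr   # 3.95, below the minimum, so no low outliers
print(f"IQR = {iqr:.2f}, upper fence = {upper_fence:.2f}, lower fence = {lower_fence:.2f}")
print("any low outliers possible:", minimum < lower_fence)

def remove_upper_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Keep only values at or below Q3 + k * IQR (Tukey's upper fence).

    The 'inclusive' method interpolates quartiles the same way as R's default
    quantile() (type 7), which summary() uses.
    """
    first, _, third = statistics.quantiles(values, n=4, method="inclusive")
    fence = third + k * (third - first)
    return [v for v in values if v <= fence]

# Hypothetical toy values for illustration only, not taken from the dataset.
toy = [6.8, 7.1, 7.4, 7.9, 8.3, 9.2, 15.9]
print(remove_upper_outliers(toy))
```

Only the upper fence is applied here, because the lower fence (3.95) is already below the smallest observed value (4.60).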

The distribution looks more normal now. All of the outliers lie above Q3 + 1.5 * IQR: the lower fence, 7.10 - 3.15 = 3.95, is below the minimum of 4.60, so there can be no low outliers. With that in mind, we can take a look at the histogram. I would expect the distribution to be slightly skewed right, since there are a good number of outliers above Q3 + 1.5 * IQR.

The prediction was correct!
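Skewness can also be quantified instead of judged by eye. The sketch below is in Python and is not the report's R code. It uses the moment-based sample skewness g1 = m3 / m2^1.5, where a positive value means right skew. The function name `sample_skewness` is hypothetical. It also applies the rough mean-versus-median check to the summary() values in this section. That check is only a rule of thumb, not a guarantee of skew.

```python
def sample_skewness(values: list[float]) -> float:
    """Moment-based sample skewness g1 = m3 / m2**1.5 (positive means right-skewed)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Rule-of-thumb check from the summary() outputs: mean above median hints at right skew.
summaries = {"alcohol": (10.20, 10.42), "fixed.acidity": (7.90, 8.32)}  # (median, mean)
for name, (median, mean) in summaries.items():
    print(f"{name}: mean - median = {mean - median:+.2f}")

# Hypothetical toy data with one long right tail.
print(round(sample_skewness([1, 2, 3, 4, 10]), 3))
```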
Bivariate Plots Section

##
## Pearson's product-moment correlation
##
## data: rw$alcohol and rw$quality
## t = 21.639, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4373540 0.5132081
## sample estimates:
## cor
## 0.4761663
There seems to be a moderate positive relationship between alcohol percentage and quality (r ≈ 0.476). I wanted to dig deeper and examine whether the higher-quality wines were concentrated around certain alcohol percentage levels.
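The t statistic and confidence interval reported by cor.test() can be recovered from r and the sample size alone. This Python sketch assumes n = df + 2 = 1599. It uses t = r * sqrt(df / (1 - r^2)) and a 95% interval built on the Fisher z-transform, atanh(r), with standard error 1 / sqrt(n - 3). The helper name `cor_test_stats` is hypothetical.

```python
import math

def cor_test_stats(r: float, n: int, z_crit: float = 1.959964):
    """t statistic, degrees of freedom, and Fisher-z 95% CI for a Pearson r from n pairs."""
    df = n - 2
    t = r * math.sqrt(df / (1 - r * r))
    z = math.atanh(r)             # Fisher z-transform
    se = 1 / math.sqrt(n - 3)     # standard error on the z scale
    ci = (math.tanh(z - z_crit * se), math.tanh(z + z_crit * se))
    return t, df, ci

# r for alcohol vs. quality, copied from the output above.
t, df, (lo, hi) = cor_test_stats(0.4761663, 1599)
print(f"t = {t:.3f}, df = {df}, 95% CI = ({lo:.4f}, {hi:.4f})")
```

The results agree with the reported t = 21.639 and interval (0.4374, 0.5132) to the precision printed. This shows that the test statistic is driven by both the size of r and the large number of wines.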

The distribution of alcohol percentage for red wines rated 7 looks roughly normal.

For the most part, this distribution also looks normal, although it is based on only a small number of observations.

##
## Pearson's product-moment correlation
##
## data: rw$fixed.acidity and rw$quality
## t = 4.996, df = 1597, p-value = 6.496e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.07548957 0.17202667
## sample estimates:
## cor
## 0.1240516
There doesn’t seem to be a strong relationship between fixed acidity and quality: the correlation coefficient is only 0.124. The small p-value (6.5e-07) only says that this weak correlation is distinguishable from zero, not that the relationship is strong. The fixed acidity levels for red wines rated 8 seem to be concentrated within a narrower range, around 7-10.5 g/dm^3. This might not be very telling since, again, there aren’t many red wines with a rating of 8. I’m going to compare the boxplots of fixed acidity for wines of different quality ratings.
The wines of quality 7 seem to have had fixed acidity levels from around 7 to 10.5 g/dm^3. It might be a good idea to examine the boxplots of lower-quality wines next to these. Though 75% of the wines with rating 3 seemed to have acidity levels between 7.5 and 10 g/dm^3, the lower whisker for these wines is very short. The wines with rating 4 actually seemed to be more tightly packed into a certain range: 7-8.5 g/dm^3.
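With 1599 wines, even a very weak correlation can produce a tiny p-value. This can be checked by inverting t = r * sqrt(df / (1 - r^2)) to find the smallest |r| that reaches p < 0.05. This Python sketch approximates the two-sided 5% critical t value with 1.96, because a t distribution with 1597 degrees of freedom is very close to normal. It also prints r^2, the share of variance in quality that a linear fit on fixed acidity would explain. The name `critical_r` is hypothetical.

```python
import math

def critical_r(n: int, t_crit: float = 1.96) -> float:
    """Smallest |r| whose t statistic reaches t_crit, from t = r*sqrt(df/(1-r^2))."""
    df = n - 2
    return t_crit / math.sqrt(df + t_crit ** 2)

n = 1599                     # df = 1597 in the cor.test() outputs, so n = df + 2
r_crit = critical_r(n)
r_fixed_acidity = 0.1240516  # copied from the output above
print(f"|r| needed for p < 0.05 with n = {n}: about {r_crit:.3f}")
print(f"fixed acidity: r = {r_fixed_acidity:.3f}, r^2 = {r_fixed_acidity ** 2:.3f}")
```

Any |r| above roughly 0.049 is statistically significant at this sample size. Fixed acidity clears that bar, but it accounts for only about 1.5% of the variance in quality.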

##
## Pearson's product-moment correlation
##
## data: rw$residual.sugar and rw$quality
## t = 0.5488, df = 1597, p-value = 0.5832
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.03531327 0.06271056
## sample estimates:
## cor
## 0.01373164
Based on this scatterplot, it seems that wines with higher quality ratings did not have residual sugar levels above 9. This could be a sign that the fermentation process that produces the best red wines also keeps residual sugar consistently low. However, there aren’t many wines with residual sugar levels above 9 even among red wines rated lower than 7. Interestingly, the wines rated 3 also had no residual sugar levels above 9. This may just be a result of there being too few observations at those quality ratings. The correlation coefficient (0.014, p = 0.58) does not suggest a significant linear relationship between residual sugar levels and quality ratings.

##
## Pearson's product-moment correlation
##
## data: rw$sulphates and rw$quality
## t = 10.38, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2049011 0.2967610
## sample estimates:
## cor
## 0.2513971
Again, the wines with the highest quality ratings, 7 and 8, seem to have more consistent sulphate levels. We haven’t found any single property that gives us a good idea of the quality rating, but it might be true that higher-quality wines tend to be more consistent. This makes sense, as you would expect the process of creating the highest-quality wines to be very meticulous and careful: a craft of sorts. It seems reasonable to assume that the entire process for creating higher-quality wines leads to more consistency, and that a less careful process leads to lower-quality wines and less consistency. Perhaps this is what I should be looking for in the next few plots. It should be noted, however, that fairly low-quality wines, with ratings 3 and 4, also seemed more consistent than wines rated 5 or 6. This, again, suggests that there may simply not be enough wines with ratings 3, 4, 7, or 8. The correlation coefficient (0.251) is statistically significant (p < 2.2e-16), but it indicates only a weak positive relationship.
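The sample-size explanation can be illustrated with a small simulation. The spread of a sample, and so the length of a boxplot's whiskers, tends to grow with the number of observations even when every group comes from the same distribution. This Python sketch assumes an idealized standard normal for every group. It uses the group sizes from the table() output in this section: 18 wines rated 8 and 681 rated 5. The function name `mean_range` is hypothetical, and the simulated numbers are not wine data.

```python
import random
import statistics

def mean_range(sample_size: int, trials: int = 200, seed: int = 1) -> float:
    """Average (max - min) over repeated samples from one standard normal distribution."""
    rng = random.Random(seed)
    ranges = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        ranges.append(max(sample) - min(sample))
    return statistics.mean(ranges)

small = mean_range(18)   # size of the rating-8 group
large = mean_range(681)  # size of the rating-5 group
print(f"average range, n = 18: {small:.2f}; n = 681: {large:.2f}")
```

Even with identical underlying distributions, the larger group shows a much wider range. Part of the apparent "consistency" of the rare ratings can therefore come from group size alone.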

##
## Pearson's product-moment correlation
##
## data: rw$density and rw$quality
## t = -7.0997, df = 1597, p-value = 1.875e-12
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.2220365 -0.1269870
## sample estimates:
## cor
## -0.1749192
The lower-quality wines, those rated 3, 4, and 5, seem to have stayed above a certain density level: they never went below 0.99250 g/cm^3. Wines rated 6, 7, and 8, however, all had instances where density fell below 0.99250. Again, the correlation coefficient (-0.175) points to only a weak negative relationship between the variables, even though it is statistically significant.

##
## Pearson's product-moment correlation
##
## data: rw$chlorides and rw$quality
## t = -5.1948, df = 1597, p-value = 2.313e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.17681041 -0.08039344
## sample estimates:
## cor
## -0.1289066
Here, the red wines of higher quality again seemed to be more consistent in their chloride levels (especially those rated 8). It also looks like higher-quality wines tended to have lower chloride levels.
However, the correlation coefficient (-0.129) indicates only a weak negative relationship.
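All six cor.test() results against quality in this section can be ranked side by side. This Python sketch only reorganizes the coefficients printed above: it sorts them by absolute value and shows r^2 next to each one. No new data is involved.

```python
# Correlation coefficients with quality, copied from the cor.test() outputs above.
correlations = {
    "alcohol": 0.4761663,
    "sulphates": 0.2513971,
    "density": -0.1749192,
    "chlorides": -0.1289066,
    "fixed.acidity": 0.1240516,
    "residual.sugar": 0.01373164,
}

# Strongest linear association first, regardless of sign.
ranked = sorted(correlations.items(), key=lambda item: abs(item[1]), reverse=True)
for name, r in ranked:
    print(f"{name:15s} r = {r:+.3f}  r^2 = {r * r:.3f}")
```

Alcohol is the only variable whose r^2 exceeds about 0.07. Each of the other variables, taken alone, explains little of the variation in quality.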
It should be noted, again, that our findings about consistency might simply be a result of there being many more wines with ratings 5 and 6. We can quickly check whether this is the case with the table() function.
##
##   3   4   5   6   7   8
##  10  53 681 638 199  18
Indeed, there are many more wines rated 5 or 6 than wines rated 3, 4, 7, or 8. This likely explains why the more highly rated wines appeared to be more consistent.
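The imbalance can be put into numbers with the counts from the table() output. This Python sketch computes the total, the share of wines rated 5 or 6, and the mean rating. It uses only the six counts printed above.

```python
# Counts per quality rating, copied from the table() output above.
counts = {3: 10, 4: 53, 5: 681, 6: 638, 7: 199, 8: 18}

total = sum(counts.values())
middle = counts[5] + counts[6]
mean_quality = sum(q * c for q, c in counts.items()) / total

print(f"total wines: {total}")
print(f"rated 5 or 6: {middle} ({middle / total:.1%})")
print(f"mean quality: {mean_quality:.2f}")
```

The total of 1599 matches the df = 1597 (n - 2) in every cor.test() output. Roughly 82.5% of the wines fall in the two middle ratings.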

##
## Pearson's product-moment correlation
##
## data: rw$free.sulfur.dioxide and rw$total.sulfur.dioxide
## t = 35.84, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.6395786 0.6939740
## sample estimates:
## cor
## 0.6676665
Not a very interesting relationship, but higher free sulfur dioxide levels are positively correlated with higher total sulfur dioxide levels (r ≈ 0.668). However, a large proportion of the red wines have fairly low free and total sulfur dioxide levels, as shown by the dark blobs that remain toward the bottom left of the plot.
It might be interesting to check for relationships between density and variables that might contribute to higher density, such as sulphates, residual sugar, and chlorides.

##
## Pearson's product-moment correlation
##
## data: rw$sulphates and rw$density
## t = 6.0012, df = 1597, p-value = 2.418e-09
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1002148 0.1961000
## sample estimates:
## cor
## 0.1485064
Higher sulphate levels do not seem to contribute much to higher density: the correlation coefficient (0.149) indicates only a weak positive relationship.

##
## Pearson's product-moment correlation
##
## data: rw$residual.sugar and rw$density
## t = 15.189, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3116908 0.3973835
## sample estimates:
## cor
## 0.3552834
Based only on the plot, it didn’t seem to me that higher residual sugar levels had much of an effect on density. However, the correlation coefficient of 0.355 indicates a moderate positive relationship between residual sugar and density.

##
## Pearson's product-moment correlation
##
## data: rw$alcohol and rw$density
## t = -22.838, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.5322547 -0.4583061
## sample estimates:
## cor
## -0.4961798
Interestingly, there does seem to be a relationship between alcohol percentage and density: as alcohol percentage increases, density decreases (r ≈ -0.496), so these variables are negatively correlated. This makes physical sense, because ethanol (about 0.79 g/cm^3) is less dense than water (about 1.00 g/cm^3).
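A rough physical sanity check follows from the fact that ethanol is less dense than water (roughly 0.789 vs 0.998 g/cm^3 at 20 °C). This Python sketch treats wine as an idealized water-ethanol mixture with additive volumes. That assumption ignores volume contraction on mixing and dissolved solids such as sugars and acids, which both shift real wine densities. Only the direction of the trend is meaningful, and the absolute values should not be compared with the measured densities. The alcohol levels used are the minimum, median, and maximum from the alcohol summary() output. The constants and the function name `ideal_mix_density` are assumptions for illustration.

```python
# Idealized estimate only: assumes additive volumes of water and ethanol and
# ignores dissolved solids, so only the direction of the trend is meaningful.
RHO_WATER = 0.998    # g/cm^3, approximate value at 20 °C
RHO_ETHANOL = 0.789  # g/cm^3, approximate value at 20 °C

def ideal_mix_density(abv_percent: float) -> float:
    """Volume-weighted density of a water-ethanol mix at the given % alcohol by volume."""
    v = abv_percent / 100
    return (1 - v) * RHO_WATER + v * RHO_ETHANOL

# Min, median, and max alcohol % from the summary() output.
for abv in (8.4, 10.2, 14.9):
    print(f"{abv:4.1f}% ABV -> {ideal_mix_density(abv):.4f} g/cm^3")
```

Even in this crude model, each extra percentage point of alcohol lowers density, in line with the negative correlation observed.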